Syndicate: A Scalable, Read/Write File Store for Edge Applications
Abstract
Applications in middle- and last-mile networks (the "edge") that make use of local storage, network caches, and datacenter storage must consider availability, durability, performance, cost, and consistency requirements in their design. We present Syndicate, a wide-area read/write file store that addresses these concerns.

To illustrate them, suppose a physicist PI and her collaborators host experimental data on her university's file server. As research progresses, she discovers she needs higher data availability and durability, so she uploads the important datasets to cloud storage. While her collaborators may still read and edit them, she now pays for hosting and data transfer. Later, the group starts to use off-site grid computers to regularly download and process the data. To decrease latency and transfer costs and to increase bandwidth and availability, the PI employs network caches to scale up the number of concurrent downloads. However, depending on caching policy, remote readers may get stale data well after a modification, leaving the collaborators with invalid results.

This example offers four key insights. First, using network caches for remote reads improves availability and amortized performance regardless of where the data is hosted. By using network caches, the collaborators may store their data wherever is best for them; only cache misses affect read performance. Second, durability only needs to be considered on writes. When a collaborator commits new experimental data, he chooses how many replicas to make, and where to put them, to achieve a desired durability. This choice is specific to the dataset, and is a matter of policy (e.g. durability or cost), not implementation. Third, remote readers cannot rely on caches for consistency. Even though many caches today offer support for object TTLs and refresh requests (i.e. via HTTP directives), the cache has its own cost and performance objectives, and may choose to ignore directives to meet them. For example, a cache could keep an object resident beyond its TTL to reduce the cost and performance penalties of frequent revalidation. Moreover, because caches can be transparent, the collaborators and grid computers can neither reliably control nor predict caching policy. Fourth, widely deployed storage and caching infrastructure is almost good enough. Rather than modifying the infrastructure, the collaborators address these concerns out-of-band (e.g. storage conventions on a wiki).

From these insights, we derive Syndicate. Syndicate organizes data into a filesystem (a Volume) and addresses consistency by treating each version of each block …
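The third insight (readers cannot rely on caches for consistency) can be made concrete with a small simulation. The sketch below is illustrative only and is not taken from Syndicate: the `LenientCache` class, its `grace` parameter, and the `demo` function are hypothetical names, and the origin store is modelled as a plain dictionary. It compares a cache that honors a 10-second TTL with one that keeps objects resident for an extra 60 seconds.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    value: str
    stored_at: float  # time the object was fetched from the origin
    ttl: float        # freshness lifetime requested by the reader


class LenientCache:
    """Hypothetical cache that may keep entries past their TTL."""

    def __init__(self, origin, grace=0.0):
        self.origin = origin  # dict standing in for the origin store
        self.grace = grace    # extra seconds an expired entry is still served
        self.entries = {}

    def get(self, key, now, ttl=10.0):
        entry = self.entries.get(key)
        if entry is not None and now - entry.stored_at < entry.ttl + self.grace:
            return entry.value  # cache hit: possibly stale
        value = self.origin[key]  # cache miss: fetch from the origin
        self.entries[key] = Entry(value, now, ttl)
        return value


def demo():
    origin = {"dataset": "v1"}
    strict = LenientCache(origin, grace=0.0)    # honors the TTL
    lenient = LenientCache(origin, grace=60.0)  # ignores it for 60 s

    # t = 0: both caches fetch the dataset from the origin.
    first = (strict.get("dataset", 0.0), lenient.get("dataset", 0.0))

    # t = 5: a collaborator overwrites the dataset at the origin.
    origin["dataset"] = "v2"

    # t = 20: the 10 s TTL has expired for both cached copies.
    later = (strict.get("dataset", 20.0), lenient.get("dataset", 20.0))
    return first, later
```

Calling `demo()` shows the strict cache returning the new data after the TTL expires, while the lenient cache still returns the old copy. A reader cannot tell from the outside which kind of cache sits on the path.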
Similar resources
Dripcast - Architecture and Implementation of Server-less Java Programming Framework for Billions of IoT Devices
We propose “Dripcast,” a new server-less Java programming framework for billions of IoT (Internet of Things) devices. The framework makes it easy to develop device applications working with a cloud, that is, scalable computing resources on the Internet. The framework consists of two key technologies; (1) transparent remote procedure call (2) mechanism to read, write and process Java objects wit...
Consistent Join Queries in Cloud Data Stores
NoSQL Cloud data stores provide scalability and high availability properties for web applications, but do not support complex queries such as joins. Developers must therefore design their programs according to the peculiarities of NoSQL data stores rather than established software engineering practice. This results in complex and error-prone code, especially when it comes to subtle issues such ...
File System Performance and Transaction Support
This thesis considers two related issues: the impact of disk layout on file system throughput and the integration of transaction support in file systems. Historic file system designs have optimized for reading, as read throughput was the I/O performance bottleneck. Since increasing main-memory cache sizes effectively reduce disk read traffic [BAKER91], disk write performance has become the I/O ...
Lecture Notes for CS347: Operating Systems
• A file is a way to permanently store user and kernel code and data, typically on non-volatile storage like a hard disk. Secondary storage disks or hard disks are block devices that store contents of files across several blocks. In addition to the actual content of the file on disk, a file has several metadata associated with it: a name, location in a directory structure, location (i.e., addre...
Design of a Write-Optimized Data Store
The WriteBuffer (WB) Tree is a new write-optimized data structure that can be used to implement per-node storage in unordered key-value stores. The WB Tree provides faster writes than the Log-Structured Merge (LSM) Tree that is used in many current high-performance key-value stores. It achieves this by replacing compactions in LSM Trees, which are I/O-intensive, with light-weight spills and spl...